[BugFix] Fix non-tensor writing in modules #822

Merged: 1 commit, Jun 19, 2024

@vmoens vmoens commented Jun 19, 2024

Fixes #821

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 19, 2024
@vmoens vmoens added the bug label Jun 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}14$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 51.2960μs 17.3266μs 57.7146 KOps/s 58.1043 KOps/s $\color{#d91a1a}-0.67\%$
test_plain_set_stack_nested 38.1910μs 17.3886μs 57.5089 KOps/s 57.5841 KOps/s $\color{#d91a1a}-0.13\%$
test_plain_set_nested_inplace 47.4080μs 19.2359μs 51.9862 KOps/s 51.6498 KOps/s $\color{#35bf28}+0.65\%$
test_plain_set_stack_nested_inplace 59.4010μs 19.3948μs 51.5601 KOps/s 52.3968 KOps/s $\color{#d91a1a}-1.60\%$
test_items 22.5820μs 2.4830μs 402.7352 KOps/s 393.6284 KOps/s $\color{#35bf28}+2.31\%$
test_items_nested 0.4648ms 0.2669ms 3.7462 KOps/s 3.6695 KOps/s $\color{#35bf28}+2.09\%$
test_items_nested_locked 0.8178ms 0.2689ms 3.7187 KOps/s 3.6301 KOps/s $\color{#35bf28}+2.44\%$
test_items_nested_leaf 0.1647ms 76.0032μs 13.1573 KOps/s 12.8023 KOps/s $\color{#35bf28}+2.77\%$
test_items_stack_nested 0.9994ms 0.2794ms 3.5789 KOps/s 3.6052 KOps/s $\color{#d91a1a}-0.73\%$
test_items_stack_nested_leaf 0.1416ms 78.5608μs 12.7290 KOps/s 12.6895 KOps/s $\color{#35bf28}+0.31\%$
test_items_stack_nested_locked 0.8476ms 0.2739ms 3.6513 KOps/s 3.6166 KOps/s $\color{#35bf28}+0.96\%$
test_keys 23.3840μs 3.8121μs 262.3226 KOps/s 247.9636 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_keys_nested 0.1941ms 0.1358ms 7.3626 KOps/s 7.3228 KOps/s $\color{#35bf28}+0.54\%$
test_keys_nested_locked 0.6929ms 0.1403ms 7.1254 KOps/s 7.0606 KOps/s $\color{#35bf28}+0.92\%$
test_keys_nested_leaf 0.5103ms 0.1151ms 8.6908 KOps/s 8.5631 KOps/s $\color{#35bf28}+1.49\%$
test_keys_stack_nested 0.3402ms 0.1355ms 7.3774 KOps/s 7.3545 KOps/s $\color{#35bf28}+0.31\%$
test_keys_stack_nested_leaf 0.2158ms 0.1148ms 8.7131 KOps/s 8.6186 KOps/s $\color{#35bf28}+1.10\%$
test_keys_stack_nested_locked 0.2117ms 0.1387ms 7.2083 KOps/s 7.0727 KOps/s $\color{#35bf28}+1.92\%$
test_values 11.2368μs 1.1768μs 849.7637 KOps/s 851.4576 KOps/s $\color{#d91a1a}-0.20\%$
test_values_nested 0.1020ms 50.4539μs 19.8201 KOps/s 19.4593 KOps/s $\color{#35bf28}+1.85\%$
test_values_nested_locked 0.1044ms 50.8142μs 19.6796 KOps/s 19.4699 KOps/s $\color{#35bf28}+1.08\%$
test_values_nested_leaf 94.7570μs 45.6986μs 21.8825 KOps/s 21.6770 KOps/s $\color{#35bf28}+0.95\%$
test_values_stack_nested 0.1028ms 51.2472μs 19.5133 KOps/s 18.9034 KOps/s $\color{#35bf28}+3.23\%$
test_values_stack_nested_leaf 98.9540μs 45.5557μs 21.9511 KOps/s 21.6241 KOps/s $\color{#35bf28}+1.51\%$
test_values_stack_nested_locked 93.0940μs 50.8402μs 19.6695 KOps/s 18.9190 KOps/s $\color{#35bf28}+3.97\%$
test_membership 21.8010μs 1.3335μs 749.9066 KOps/s 732.8677 KOps/s $\color{#35bf28}+2.32\%$
test_membership_nested 26.5290μs 3.3823μs 295.6550 KOps/s 267.5486 KOps/s $\textbf{\color{#35bf28}+10.51\%}$
test_membership_nested_leaf 27.1600μs 3.4143μs 292.8861 KOps/s 269.1545 KOps/s $\textbf{\color{#35bf28}+8.82\%}$
test_membership_stacked_nested 19.8170μs 3.3611μs 297.5218 KOps/s 284.0678 KOps/s $\color{#35bf28}+4.74\%$
test_membership_stacked_nested_leaf 19.8770μs 3.3782μs 296.0153 KOps/s 285.5946 KOps/s $\color{#35bf28}+3.65\%$
test_membership_nested_last 24.9660μs 4.1327μs 241.9740 KOps/s 233.5604 KOps/s $\color{#35bf28}+3.60\%$
test_membership_nested_leaf_last 43.9020μs 4.1522μs 240.8333 KOps/s 234.5438 KOps/s $\color{#35bf28}+2.68\%$
test_membership_stacked_nested_last 43.5510μs 4.6825μs 213.5633 KOps/s 237.1471 KOps/s $\textbf{\color{#d91a1a}-9.94\%}$
test_membership_stacked_nested_leaf_last 27.8310μs 4.7265μs 211.5710 KOps/s 234.9089 KOps/s $\textbf{\color{#d91a1a}-9.93\%}$
test_nested_getleaf 31.9300μs 10.4915μs 95.3151 KOps/s 95.3319 KOps/s $\color{#d91a1a}-0.02\%$
test_nested_get 32.9710μs 10.0042μs 99.9578 KOps/s 100.7190 KOps/s $\color{#d91a1a}-0.76\%$
test_stacked_getleaf 29.9650μs 10.3890μs 96.2553 KOps/s 95.6802 KOps/s $\color{#35bf28}+0.60\%$
test_stacked_get 32.6910μs 9.9788μs 100.2122 KOps/s 102.0514 KOps/s $\color{#d91a1a}-1.80\%$
test_nested_getitemleaf 34.1440μs 10.9976μs 90.9290 KOps/s 89.4364 KOps/s $\color{#35bf28}+1.67\%$
test_nested_getitem 33.9530μs 10.1910μs 98.1255 KOps/s 97.2447 KOps/s $\color{#35bf28}+0.91\%$
test_stacked_getitemleaf 35.5460μs 11.0451μs 90.5381 KOps/s 91.1326 KOps/s $\color{#d91a1a}-0.65\%$
test_stacked_getitem 34.3240μs 10.2460μs 97.5995 KOps/s 98.6046 KOps/s $\color{#d91a1a}-1.02\%$
test_lock_nested 50.8784ms 0.3858ms 2.5917 KOps/s 2.9240 KOps/s $\textbf{\color{#d91a1a}-11.36\%}$
test_lock_stack_nested 0.5149ms 0.3044ms 3.2847 KOps/s 3.2038 KOps/s $\color{#35bf28}+2.53\%$
test_unlock_nested 0.7039ms 0.3466ms 2.8850 KOps/s 2.8331 KOps/s $\color{#35bf28}+1.83\%$
test_unlock_stack_nested 0.4422ms 0.3112ms 3.2138 KOps/s 3.1308 KOps/s $\color{#35bf28}+2.65\%$
test_flatten_speed 0.2063ms 95.9659μs 10.4204 KOps/s 10.4535 KOps/s $\color{#d91a1a}-0.32\%$
test_unflatten_speed 0.6859ms 0.4002ms 2.4990 KOps/s 2.4275 KOps/s $\color{#35bf28}+2.95\%$
test_common_ops 5.1033ms 0.7295ms 1.3709 KOps/s 1.3554 KOps/s $\color{#35bf28}+1.14\%$
test_creation 33.3520μs 1.8869μs 529.9599 KOps/s 520.3709 KOps/s $\color{#35bf28}+1.84\%$
test_creation_empty 33.1020μs 11.7519μs 85.0928 KOps/s 94.2737 KOps/s $\textbf{\color{#d91a1a}-9.74\%}$
test_creation_nested_1 39.3140μs 14.1291μs 70.7758 KOps/s 74.8154 KOps/s $\textbf{\color{#d91a1a}-5.40\%}$
test_creation_nested_2 38.6120μs 17.4364μs 57.3512 KOps/s 60.5435 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_clone 1.2676ms 12.8525μs 77.8057 KOps/s 73.4516 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_getitem[int] 31.9390μs 11.3638μs 87.9988 KOps/s 84.1891 KOps/s $\color{#35bf28}+4.53\%$
test_getitem[slice_int] 59.9110μs 22.2946μs 44.8539 KOps/s 43.4194 KOps/s $\color{#35bf28}+3.30\%$
test_getitem[range] 78.7670μs 58.4610μs 17.1054 KOps/s 16.0958 KOps/s $\textbf{\color{#35bf28}+6.27\%}$
test_getitem[tuple] 42.8200μs 18.4920μs 54.0774 KOps/s 51.3868 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_getitem[list] 0.1055ms 40.7032μs 24.5681 KOps/s 23.2442 KOps/s $\textbf{\color{#35bf28}+5.70\%}$
test_setitem_dim[int] 0.1022ms 34.7980μs 28.7373 KOps/s 29.0665 KOps/s $\color{#d91a1a}-1.13\%$
test_setitem_dim[slice_int] 99.4160μs 63.6140μs 15.7198 KOps/s 16.2926 KOps/s $\color{#d91a1a}-3.52\%$
test_setitem_dim[range] 0.2074ms 83.2905μs 12.0062 KOps/s 11.8679 KOps/s $\color{#35bf28}+1.17\%$
test_setitem_dim[tuple] 0.1119ms 50.0015μs 19.9994 KOps/s 20.0427 KOps/s $\color{#d91a1a}-0.22\%$
test_setitem 68.4880μs 20.2920μs 49.2805 KOps/s 50.0193 KOps/s $\color{#d91a1a}-1.48\%$
test_set 60.3320μs 19.6342μs 50.9316 KOps/s 50.5923 KOps/s $\color{#35bf28}+0.67\%$
test_set_shared 3.3922ms 0.1418ms 7.0545 KOps/s 6.7874 KOps/s $\color{#35bf28}+3.94\%$
test_update 0.1015ms 22.5615μs 44.3233 KOps/s 44.5708 KOps/s $\color{#d91a1a}-0.56\%$
test_update_nested 92.2020μs 31.0768μs 32.1784 KOps/s 31.2925 KOps/s $\color{#35bf28}+2.83\%$
test_update__nested 65.0310μs 24.7442μs 40.4135 KOps/s 39.3452 KOps/s $\color{#35bf28}+2.72\%$
test_set_nested 80.3600μs 21.5016μs 46.5081 KOps/s 46.0378 KOps/s $\color{#35bf28}+1.02\%$
test_set_nested_new 85.7400μs 25.4657μs 39.2685 KOps/s 38.4853 KOps/s $\color{#35bf28}+2.04\%$
test_select 0.1313ms 41.0137μs 24.3821 KOps/s 24.2402 KOps/s $\color{#35bf28}+0.59\%$
test_select_nested 0.1201ms 59.6835μs 16.7551 KOps/s 16.1059 KOps/s $\color{#35bf28}+4.03\%$
test_exclude_nested 0.2322ms 0.1200ms 8.3339 KOps/s 8.1318 KOps/s $\color{#35bf28}+2.49\%$
test_empty[True] 0.8619ms 0.4036ms 2.4779 KOps/s 2.4576 KOps/s $\color{#35bf28}+0.83\%$
test_empty[False] 6.9177μs 1.1844μs 844.3199 KOps/s 834.9825 KOps/s $\color{#35bf28}+1.12\%$
test_unbind_speed 0.3405ms 0.2525ms 3.9598 KOps/s 3.8775 KOps/s $\color{#35bf28}+2.12\%$
test_unbind_speed_stack0 0.3928ms 0.2514ms 3.9780 KOps/s 3.9222 KOps/s $\color{#35bf28}+1.42\%$
test_unbind_speed_stack1 70.1348ms 0.7267ms 1.3760 KOps/s 1.3526 KOps/s $\color{#35bf28}+1.73\%$
test_split 69.2739ms 1.5704ms 636.7961 Ops/s 621.9931 Ops/s $\color{#35bf28}+2.38\%$
test_chunk 65.9011ms 1.5743ms 635.1986 Ops/s 622.0832 Ops/s $\color{#35bf28}+2.11\%$
test_creation[device0] 0.2303ms 84.2772μs 11.8656 KOps/s 11.5548 KOps/s $\color{#35bf28}+2.69\%$
test_creation_from_tensor 4.1030ms 86.8164μs 11.5186 KOps/s 11.6636 KOps/s $\color{#d91a1a}-1.24\%$
test_add_one[memmap_tensor0] 59.5010μs 5.2517μs 190.4130 KOps/s 177.9141 KOps/s $\textbf{\color{#35bf28}+7.03\%}$
test_contiguous[memmap_tensor0] 21.4200μs 0.6359μs 1.5727 MOps/s 1.5871 MOps/s $\color{#d91a1a}-0.91\%$
test_stack[memmap_tensor0] 27.8410μs 3.5828μs 279.1149 KOps/s 273.0946 KOps/s $\color{#35bf28}+2.20\%$
test_memmaptd_index 0.9497ms 0.2511ms 3.9823 KOps/s 3.8792 KOps/s $\color{#35bf28}+2.66\%$
test_memmaptd_index_astensor 0.5868ms 0.3235ms 3.0916 KOps/s 2.9781 KOps/s $\color{#35bf28}+3.81\%$
test_memmaptd_index_op 1.4038ms 0.6146ms 1.6270 KOps/s 1.5929 KOps/s $\color{#35bf28}+2.14\%$
test_serialize_model 0.1686s 0.1135s 8.8113 Ops/s 8.5170 Ops/s $\color{#35bf28}+3.46\%$
test_serialize_model_pickle 0.4512s 0.3802s 2.6300 Ops/s 2.6487 Ops/s $\color{#d91a1a}-0.71\%$
test_serialize_weights 0.1657s 0.1100s 9.0881 Ops/s 8.9668 Ops/s $\color{#35bf28}+1.35\%$
test_serialize_weights_returnearly 0.1927s 0.1347s 7.4244 Ops/s 7.0045 Ops/s $\textbf{\color{#35bf28}+5.99\%}$
test_serialize_weights_pickle 1.2024s 0.5812s 1.7207 Ops/s 2.3605 Ops/s $\textbf{\color{#d91a1a}-27.11\%}$
test_serialize_weights_filesystem 98.1537ms 92.5202ms 10.8085 Ops/s 10.2772 Ops/s $\textbf{\color{#35bf28}+5.17\%}$
test_serialize_model_filesystem 0.1594s 98.5234ms 10.1499 Ops/s 9.9278 Ops/s $\color{#35bf28}+2.24\%$
test_reshape_pytree 53.1690μs 25.0972μs 39.8451 KOps/s 38.4393 KOps/s $\color{#35bf28}+3.66\%$
test_reshape_td 81.4620μs 33.5398μs 29.8153 KOps/s 28.8059 KOps/s $\color{#35bf28}+3.50\%$
test_view_pytree 74.4780μs 25.2981μs 39.5287 KOps/s 38.1446 KOps/s $\color{#35bf28}+3.63\%$
test_view_td 0.1012ms 37.9432μs 26.3552 KOps/s 25.2913 KOps/s $\color{#35bf28}+4.21\%$
test_unbind_pytree 80.2500μs 29.5109μs 33.8858 KOps/s 33.1061 KOps/s $\color{#35bf28}+2.36\%$
test_unbind_td 0.3510ms 36.9845μs 27.0384 KOps/s 25.8691 KOps/s $\color{#35bf28}+4.52\%$
test_split_pytree 0.1011ms 29.1079μs 34.3550 KOps/s 32.8909 KOps/s $\color{#35bf28}+4.45\%$
test_split_td 0.1204ms 39.8589μs 25.0885 KOps/s 24.4753 KOps/s $\color{#35bf28}+2.51\%$
test_add_pytree 73.0960μs 34.8261μs 28.7141 KOps/s 28.2264 KOps/s $\color{#35bf28}+1.73\%$
test_add_td 0.1331ms 57.3956μs 17.4229 KOps/s 17.1269 KOps/s $\color{#35bf28}+1.73\%$
test_distributed 0.2924ms 0.1019ms 9.8094 KOps/s 9.7693 KOps/s $\color{#35bf28}+0.41\%$
test_tdmodule 71.8740μs 18.0459μs 55.4142 KOps/s 58.0628 KOps/s $\color{#d91a1a}-4.56\%$
test_tdmodule_dispatch 92.8030μs 35.8797μs 27.8709 KOps/s 28.8995 KOps/s $\color{#d91a1a}-3.56\%$
test_tdseq 51.7070μs 21.7262μs 46.0274 KOps/s 49.5978 KOps/s $\textbf{\color{#d91a1a}-7.20\%}$
test_tdseq_dispatch 86.8720μs 41.4517μs 24.1245 KOps/s 25.2262 KOps/s $\color{#d91a1a}-4.37\%$
test_instantiation_functorch 1.4983ms 1.2941ms 772.7347 Ops/s 764.3585 Ops/s $\color{#35bf28}+1.10\%$
test_instantiation_td 1.7398ms 1.0024ms 997.6495 Ops/s 996.8255 Ops/s $\color{#35bf28}+0.08\%$
test_exec_functorch 0.2860ms 0.1577ms 6.3394 KOps/s 6.1922 KOps/s $\color{#35bf28}+2.38\%$
test_exec_functional_call 0.2484ms 0.1491ms 6.7089 KOps/s 6.6574 KOps/s $\color{#35bf28}+0.77\%$
test_exec_td 0.6962ms 0.1531ms 6.5319 KOps/s 6.7636 KOps/s $\color{#d91a1a}-3.42\%$
test_exec_td_decorator 0.8788ms 0.2216ms 4.5136 KOps/s 4.4233 KOps/s $\color{#35bf28}+2.04\%$
test_vmap_mlp_speed[True-True] 0.6583ms 0.4753ms 2.1039 KOps/s 2.0433 KOps/s $\color{#35bf28}+2.96\%$
test_vmap_mlp_speed[True-False] 0.7651ms 0.4729ms 2.1146 KOps/s 2.0474 KOps/s $\color{#35bf28}+3.28\%$
test_vmap_mlp_speed[False-True] 0.7376ms 0.3819ms 2.6187 KOps/s 2.5106 KOps/s $\color{#35bf28}+4.31\%$
test_vmap_mlp_speed[False-False] 0.7685ms 0.3821ms 2.6174 KOps/s 2.5098 KOps/s $\color{#35bf28}+4.29\%$
test_vmap_mlp_speed_decorator[True-True] 0.8235ms 0.5485ms 1.8233 KOps/s 1.7740 KOps/s $\color{#35bf28}+2.78\%$
test_vmap_mlp_speed_decorator[True-False] 0.7542ms 0.5450ms 1.8347 KOps/s 1.7855 KOps/s $\color{#35bf28}+2.76\%$
test_vmap_mlp_speed_decorator[False-True] 0.7081ms 0.4451ms 2.2468 KOps/s 2.1754 KOps/s $\color{#35bf28}+3.28\%$
test_vmap_mlp_speed_decorator[False-False] 0.6486ms 0.4454ms 2.2454 KOps/s 2.1660 KOps/s $\color{#35bf28}+3.66\%$
test_to_module_speed[True] 2.6124ms 1.7726ms 564.1485 Ops/s 586.0710 Ops/s $\color{#d91a1a}-3.74\%$
test_to_module_speed[False] 1.7536ms 1.6591ms 602.7519 Ops/s 599.2966 Ops/s $\color{#35bf28}+0.58\%$
test_tc_init 75.6120μs 31.3600μs 31.8878 KOps/s 34.7070 KOps/s $\textbf{\color{#d91a1a}-8.12\%}$
test_tc_init_nested 0.1123ms 64.1096μs 15.5983 KOps/s 16.7355 KOps/s $\textbf{\color{#d91a1a}-6.80\%}$
test_tc_first_layer_tensor 4.8907μs 0.7105μs 1.4074 MOps/s 1.4609 MOps/s $\color{#d91a1a}-3.66\%$
test_tc_first_layer_nontensor 2.6214μs 0.6980μs 1.4327 MOps/s 1.4545 MOps/s $\color{#d91a1a}-1.50\%$
test_tc_second_layer_tensor 19.6260μs 1.8532μs 539.6151 KOps/s 536.1234 KOps/s $\color{#35bf28}+0.65\%$
test_tc_second_layer_nontensor 8.1287μs 1.5723μs 635.9915 KOps/s 645.9453 KOps/s $\color{#d91a1a}-1.54\%$
test_unbind 80.9433ms 6.2793ms 159.2546 Ops/s 162.2946 Ops/s $\color{#d91a1a}-1.87\%$
test_full_like 17.2742ms 11.9492ms 83.6876 Ops/s 90.5152 Ops/s $\textbf{\color{#d91a1a}-7.54\%}$
test_zeros_like 15.7330ms 6.3220ms 158.1790 Ops/s 172.1475 Ops/s $\textbf{\color{#d91a1a}-8.11\%}$
test_ones_like 12.7897ms 6.7663ms 147.7918 Ops/s 157.0494 Ops/s $\textbf{\color{#d91a1a}-5.89\%}$
test_clone 12.9610ms 8.5016ms 117.6254 Ops/s 125.5835 Ops/s $\textbf{\color{#d91a1a}-6.34\%}$
test_squeeze 66.1030μs 13.9505μs 71.6821 KOps/s 71.6532 KOps/s $\color{#35bf28}+0.04\%$
test_unsqueeze 0.2062ms 59.5508μs 16.7924 KOps/s 16.5356 KOps/s $\color{#35bf28}+1.55\%$
test_split 0.2230ms 0.1105ms 9.0480 KOps/s 8.8717 KOps/s $\color{#35bf28}+1.99\%$
test_permute 0.2674ms 0.1265ms 7.9079 KOps/s 7.7820 KOps/s $\color{#35bf28}+1.62\%$
test_stack 27.2645ms 23.2323ms 43.0435 Ops/s 44.5649 Ops/s $\color{#d91a1a}-3.41\%$
test_cat 27.3509ms 23.3520ms 42.8228 Ops/s 44.6422 Ops/s $\color{#d91a1a}-4.08\%$
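For readers checking the table by hand, the Change column appears consistent with the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The sketch below recomputes it for three rows copied from the table above. The function names and the 5% threshold are assumptions: the threshold is inferred from the fact that the rows rendered in bold match the Improved/Worsened counts, and it is not taken from the benchmark tooling itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise threshold"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table above.
rows = {
    "test_plain_set_nested": (57.7146, 58.1043),
    "test_keys": (262.3226, 247.9636),
    "test_lock_nested": (2.5917, 2.9240),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the Ops values in the table are already rounded, a recomputed figure can differ from the reported one in the last digit.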


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}19$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 38.3010μs 13.7771μs 72.5842 KOps/s 78.3280 KOps/s $\textbf{\color{#d91a1a}-7.33\%}$
test_plain_set_stack_nested 30.5210μs 13.9013μs 71.9356 KOps/s 77.2440 KOps/s $\textbf{\color{#d91a1a}-6.87\%}$
test_plain_set_nested_inplace 35.4910μs 15.1525μs 65.9957 KOps/s 71.2368 KOps/s $\textbf{\color{#d91a1a}-7.36\%}$
test_plain_set_stack_nested_inplace 61.4610μs 15.2376μs 65.6269 KOps/s 70.5036 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_items 19.2610μs 4.7791μs 209.2439 KOps/s 209.2259 KOps/s $+0.01\%$
test_items_nested 0.4308ms 0.3403ms 2.9383 KOps/s 2.9591 KOps/s $\color{#d91a1a}-0.70\%$
test_items_nested_locked 0.3866ms 0.3416ms 2.9273 KOps/s 2.9243 KOps/s $\color{#35bf28}+0.10\%$
test_items_nested_leaf 0.1033ms 83.5737μs 11.9655 KOps/s 12.1516 KOps/s $\color{#d91a1a}-1.53\%$
test_items_stack_nested 0.5318ms 0.3430ms 2.9159 KOps/s 2.9385 KOps/s $\color{#d91a1a}-0.77\%$
test_items_stack_nested_leaf 99.5920μs 83.5017μs 11.9758 KOps/s 12.1213 KOps/s $\color{#d91a1a}-1.20\%$
test_items_stack_nested_locked 0.3725ms 0.3419ms 2.9248 KOps/s 2.9477 KOps/s $\color{#d91a1a}-0.78\%$
test_keys 25.2110μs 4.3470μs 230.0435 KOps/s 229.7387 KOps/s $\color{#35bf28}+0.13\%$
test_keys_nested 91.6820μs 66.8643μs 14.9557 KOps/s 14.8950 KOps/s $\color{#35bf28}+0.41\%$
test_keys_nested_locked 2.2496ms 72.2957μs 13.8321 KOps/s 13.8631 KOps/s $\color{#d91a1a}-0.22\%$
test_keys_nested_leaf 83.2920μs 57.1716μs 17.4912 KOps/s 17.2022 KOps/s $\color{#35bf28}+1.68\%$
test_keys_stack_nested 0.1131ms 66.2197μs 15.1012 KOps/s 15.0742 KOps/s $\color{#35bf28}+0.18\%$
test_keys_stack_nested_leaf 79.6610μs 57.0506μs 17.5283 KOps/s 17.2947 KOps/s $\color{#35bf28}+1.35\%$
test_keys_stack_nested_locked 95.9010μs 71.5343μs 13.9793 KOps/s 14.0966 KOps/s $\color{#d91a1a}-0.83\%$
test_values 7.4070μs 1.8148μs 551.0261 KOps/s 553.3404 KOps/s $\color{#d91a1a}-0.42\%$
test_values_nested 50.7010μs 35.4368μs 28.2193 KOps/s 28.2545 KOps/s $\color{#d91a1a}-0.12\%$
test_values_nested_locked 59.1020μs 37.4454μs 26.7055 KOps/s 26.7782 KOps/s $\color{#d91a1a}-0.27\%$
test_values_nested_leaf 51.9700μs 31.7189μs 31.5269 KOps/s 31.6153 KOps/s $\color{#d91a1a}-0.28\%$
test_values_stack_nested 60.6310μs 36.4383μs 27.4437 KOps/s 27.9365 KOps/s $\color{#d91a1a}-1.76\%$
test_values_stack_nested_leaf 49.2110μs 32.3127μs 30.9476 KOps/s 31.1785 KOps/s $\color{#d91a1a}-0.74\%$
test_values_stack_nested_locked 56.0210μs 38.3291μs 26.0898 KOps/s 26.3048 KOps/s $\color{#d91a1a}-0.82\%$
test_membership 3.8716μs 0.7310μs 1.3680 MOps/s 1.4171 MOps/s $\color{#d91a1a}-3.46\%$
test_membership_nested 20.5410μs 2.5982μs 384.8804 KOps/s 390.4445 KOps/s $\color{#d91a1a}-1.43\%$
test_membership_nested_leaf 22.7600μs 2.5619μs 390.3279 KOps/s 384.7737 KOps/s $\color{#35bf28}+1.44\%$
test_membership_stacked_nested 15.7310μs 2.5745μs 388.4230 KOps/s 393.8321 KOps/s $\color{#d91a1a}-1.37\%$
test_membership_stacked_nested_leaf 15.9810μs 2.5703μs 389.0651 KOps/s 384.2105 KOps/s $\color{#35bf28}+1.26\%$
test_membership_nested_last 23.1500μs 3.1243μs 320.0744 KOps/s 321.7743 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_nested_leaf_last 18.6410μs 3.1427μs 318.2024 KOps/s 323.9699 KOps/s $\color{#d91a1a}-1.78\%$
test_membership_stacked_nested_last 33.2200μs 8.6121μs 116.1158 KOps/s 101.6346 KOps/s $\textbf{\color{#35bf28}+14.25\%}$
test_membership_stacked_nested_leaf_last 74.5210μs 8.6380μs 115.7679 KOps/s 102.1310 KOps/s $\textbf{\color{#35bf28}+13.35\%}$
test_nested_getleaf 26.1010μs 8.3552μs 119.6864 KOps/s 120.1259 KOps/s $\color{#d91a1a}-0.37\%$
test_nested_get 23.8010μs 7.9021μs 126.5488 KOps/s 127.0957 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getleaf 36.8700μs 8.4070μs 118.9484 KOps/s 119.1030 KOps/s $\color{#d91a1a}-0.13\%$
test_stacked_get 20.9310μs 7.8893μs 126.7545 KOps/s 126.4835 KOps/s $\color{#35bf28}+0.21\%$
test_nested_getitemleaf 25.2810μs 8.5860μs 116.4683 KOps/s 116.1301 KOps/s $\color{#35bf28}+0.29\%$
test_nested_getitem 31.0900μs 8.0538μs 124.1649 KOps/s 124.1784 KOps/s $\color{#d91a1a}-0.01\%$
test_stacked_getitemleaf 26.0300μs 8.5799μs 116.5514 KOps/s 115.8915 KOps/s $\color{#35bf28}+0.57\%$
test_stacked_getitem 0.1769ms 8.0345μs 124.4632 KOps/s 124.2574 KOps/s $\color{#35bf28}+0.17\%$
test_lock_nested 58.2538ms 0.4020ms 2.4877 KOps/s 2.4913 KOps/s $\color{#d91a1a}-0.14\%$
test_lock_stack_nested 0.3358ms 0.2934ms 3.4085 KOps/s 3.3895 KOps/s $\color{#35bf28}+0.56\%$
test_unlock_nested 60.5393ms 0.4054ms 2.4665 KOps/s 2.4521 KOps/s $\color{#35bf28}+0.59\%$
test_unlock_stack_nested 0.3324ms 0.3024ms 3.3063 KOps/s 3.2735 KOps/s $\color{#35bf28}+1.00\%$
test_flatten_speed 0.3738ms 0.1024ms 9.7696 KOps/s 9.8035 KOps/s $\color{#d91a1a}-0.35\%$
test_unflatten_speed 0.3560ms 0.2903ms 3.4447 KOps/s 3.3909 KOps/s $\color{#35bf28}+1.58\%$
test_common_ops 1.0844ms 0.6094ms 1.6409 KOps/s 1.7007 KOps/s $\color{#d91a1a}-3.51\%$
test_creation 16.9510μs 1.6292μs 613.7861 KOps/s 601.4554 KOps/s $\color{#35bf28}+2.05\%$
test_creation_empty 29.3300μs 10.6390μs 93.9936 KOps/s 115.4282 KOps/s $\textbf{\color{#d91a1a}-18.57\%}$
test_creation_nested_1 31.7700μs 12.5310μs 79.8022 KOps/s 94.7583 KOps/s $\textbf{\color{#d91a1a}-15.78\%}$
test_creation_nested_2 33.3710μs 14.6086μs 68.4529 KOps/s 79.8047 KOps/s $\textbf{\color{#d91a1a}-14.22\%}$
test_clone 68.0410μs 11.4433μs 87.3872 KOps/s 84.2176 KOps/s $\color{#35bf28}+3.76\%$
test_getitem[int] 50.7810μs 10.6804μs 93.6297 KOps/s 92.1566 KOps/s $\color{#35bf28}+1.60\%$
test_getitem[slice_int] 0.1094ms 20.0353μs 49.9120 KOps/s 47.7197 KOps/s $\color{#35bf28}+4.59\%$
test_getitem[range] 62.7610μs 45.0126μs 22.2160 KOps/s 21.6062 KOps/s $\color{#35bf28}+2.82\%$
test_getitem[tuple] 0.1991ms 18.1989μs 54.9484 KOps/s 53.5271 KOps/s $\color{#35bf28}+2.66\%$
test_getitem[list] 0.1378ms 33.0066μs 30.2970 KOps/s 30.5464 KOps/s $\color{#d91a1a}-0.82\%$
test_setitem_dim[int] 52.9710μs 31.4452μs 31.8014 KOps/s 33.5053 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_setitem_dim[slice_int] 71.9610μs 51.3511μs 19.4738 KOps/s 20.0348 KOps/s $\color{#d91a1a}-2.80\%$
test_setitem_dim[range] 87.1810μs 68.5741μs 14.5828 KOps/s 14.9809 KOps/s $\color{#d91a1a}-2.66\%$
test_setitem_dim[tuple] 68.1410μs 46.0145μs 21.7323 KOps/s 22.6784 KOps/s $\color{#d91a1a}-4.17\%$
test_setitem 50.6010μs 17.4261μs 57.3852 KOps/s 60.0696 KOps/s $\color{#d91a1a}-4.47\%$
test_set 45.4810μs 16.6554μs 60.0407 KOps/s 61.3394 KOps/s $\color{#d91a1a}-2.12\%$
test_set_shared 1.6002ms 97.9914μs 10.2050 KOps/s 10.1488 KOps/s $\color{#35bf28}+0.55\%$
test_update 0.1049ms 20.0917μs 49.7719 KOps/s 52.8421 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_update_nested 0.1986ms 25.9920μs 38.4733 KOps/s 40.8801 KOps/s $\textbf{\color{#d91a1a}-5.89\%}$
test_update__nested 65.7710μs 21.8941μs 45.6744 KOps/s 44.7196 KOps/s $\color{#35bf28}+2.14\%$
test_set_nested 0.1231ms 17.6912μs 56.5252 KOps/s 57.8917 KOps/s $\color{#d91a1a}-2.36\%$
test_set_nested_new 64.7710μs 20.9208μs 47.7994 KOps/s 49.5430 KOps/s $\color{#d91a1a}-3.52\%$
test_select 0.2118ms 35.7488μs 27.9730 KOps/s 29.7809 KOps/s $\textbf{\color{#d91a1a}-6.07\%}$
test_select_nested 1.0314ms 54.6506μs 18.2981 KOps/s 18.2892 KOps/s $\color{#35bf28}+0.05\%$
test_exclude_nested 0.1714ms 0.1113ms 8.9846 KOps/s 9.0197 KOps/s $\color{#d91a1a}-0.39\%$
test_empty[True] 0.3838ms 0.3455ms 2.8942 KOps/s 2.8910 KOps/s $\color{#35bf28}+0.11\%$
test_empty[False] 2.9650μs 0.9095μs 1.0995 MOps/s 1.0692 MOps/s $\color{#35bf28}+2.83\%$
test_to 0.1040ms 75.2289μs 13.2928 KOps/s 12.7655 KOps/s $\color{#35bf28}+4.13\%$
test_to_nonblocking 0.2110ms 63.0616μs 15.8575 KOps/s 15.7030 KOps/s $\color{#35bf28}+0.98\%$
test_unbind_speed 0.3711ms 0.2576ms 3.8822 KOps/s 3.8135 KOps/s $\color{#35bf28}+1.80\%$
test_unbind_speed_stack0 0.3053ms 0.2582ms 3.8730 KOps/s 3.8181 KOps/s $\color{#35bf28}+1.44\%$
test_unbind_speed_stack1 75.3448ms 0.7919ms 1.2629 KOps/s 1.2521 KOps/s $\color{#35bf28}+0.86\%$
test_split 76.0242ms 1.6322ms 612.6760 Ops/s 597.6822 Ops/s $\color{#35bf28}+2.51\%$
test_chunk 75.7680ms 1.6282ms 614.1894 Ops/s 643.2406 Ops/s $\color{#d91a1a}-4.52\%$
test_creation[device0] 0.1324ms 56.6215μs 17.6611 KOps/s 17.5482 KOps/s $\color{#35bf28}+0.64\%$
test_creation_from_tensor 0.1372ms 57.1049μs 17.5116 KOps/s 18.0527 KOps/s $\color{#d91a1a}-3.00\%$
test_add_one[memmap_tensor0] 76.6110μs 6.6970μs 149.3216 KOps/s 143.9983 KOps/s $\color{#35bf28}+3.70\%$
test_contiguous[memmap_tensor0] 16.4000μs 0.6603μs 1.5145 MOps/s 1.5044 MOps/s $\color{#35bf28}+0.67\%$
test_stack[memmap_tensor0] 19.2800μs 4.6201μs 216.4459 KOps/s 213.4756 KOps/s $\color{#35bf28}+1.39\%$
test_memmaptd_index 1.0359ms 0.2808ms 3.5615 KOps/s 3.5410 KOps/s $\color{#35bf28}+0.58\%$
test_memmaptd_index_astensor 0.6368ms 0.3535ms 2.8291 KOps/s 2.8075 KOps/s $\color{#35bf28}+0.77\%$
test_memmaptd_index_op 1.1485ms 0.6806ms 1.4693 KOps/s 1.4106 KOps/s $\color{#35bf28}+4.17\%$
test_serialize_model 0.1811s 0.1098s 9.1076 Ops/s 9.5502 Ops/s $\color{#d91a1a}-4.63\%$
test_serialize_model_pickle 1.4759s 1.2541s 0.7974 Ops/s 0.8073 Ops/s $\color{#d91a1a}-1.22\%$
test_serialize_weights 0.1799s 0.1081s 9.2524 Ops/s 9.6249 Ops/s $\color{#d91a1a}-3.87\%$
test_serialize_weights_returnearly 0.2624s 98.6974ms 10.1320 Ops/s 12.2891 Ops/s $\textbf{\color{#d91a1a}-17.55\%}$
test_serialize_weights_pickle 1.3491s 1.2477s 0.8015 Ops/s 0.8011 Ops/s $\color{#35bf28}+0.05\%$
test_reshape_pytree 54.6800μs 25.5578μs 39.1270 KOps/s 38.4984 KOps/s $\color{#35bf28}+1.63\%$
test_reshape_td 66.7710μs 30.3610μs 32.9370 KOps/s 32.0732 KOps/s $\color{#35bf28}+2.69\%$
test_view_pytree 0.2461ms 25.3880μs 39.3887 KOps/s 38.5466 KOps/s $\color{#35bf28}+2.18\%$
test_view_td 0.2414ms 35.1192μs 28.4744 KOps/s 27.5197 KOps/s $\color{#35bf28}+3.47\%$
test_unbind_pytree 0.2335ms 31.4844μs 31.7617 KOps/s 31.4378 KOps/s $\color{#35bf28}+1.03\%$
test_unbind_td 0.5173ms 39.6990μs 25.1896 KOps/s 23.6843 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_split_pytree 0.2525ms 34.0559μs 29.3635 KOps/s 28.4331 KOps/s $\color{#35bf28}+3.27\%$
test_split_td 0.1224ms 38.3390μs 26.0831 KOps/s 24.9560 KOps/s $\color{#35bf28}+4.52\%$
test_add_pytree 79.3210μs 37.1063μs 26.9496 KOps/s 25.8194 KOps/s $\color{#35bf28}+4.38\%$
test_add_td 0.1552ms 54.0068μs 18.5162 KOps/s 19.2816 KOps/s $\color{#d91a1a}-3.97\%$
test_distributed 2.3569ms 68.3445μs 14.6317 KOps/s 11.0912 KOps/s $\textbf{\color{#35bf28}+31.92\%}$
test_tdmodule 0.1532ms 16.0610μs 62.2626 KOps/s 68.0084 KOps/s $\textbf{\color{#d91a1a}-8.45\%}$
test_tdmodule_dispatch 57.6010μs 30.9139μs 32.3479 KOps/s 34.9468 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_tdseq 35.2410μs 17.6572μs 56.6342 KOps/s 59.9093 KOps/s $\textbf{\color{#d91a1a}-5.47\%}$
test_tdseq_dispatch 52.8210μs 34.4226μs 29.0507 KOps/s 30.9929 KOps/s $\textbf{\color{#d91a1a}-6.27\%}$
test_instantiation_functorch 1.6043ms 1.5242ms 656.0973 Ops/s 641.6219 Ops/s $\color{#35bf28}+2.26\%$
test_instantiation_td 1.5397ms 1.0420ms 959.7181 Ops/s 949.8515 Ops/s $\color{#35bf28}+1.04\%$
test_exec_functorch 0.1807ms 0.1464ms 6.8287 KOps/s 6.8289 KOps/s $-0.00\%$
test_exec_functional_call 0.1951ms 0.1330ms 7.5202 KOps/s 7.5589 KOps/s $\color{#d91a1a}-0.51\%$
test_exec_td 0.2243ms 0.1311ms 7.6258 KOps/s 7.5535 KOps/s $\color{#35bf28}+0.96\%$
test_exec_td_decorator 0.5015ms 0.2047ms 4.8850 KOps/s 4.8224 KOps/s $\color{#35bf28}+1.30\%$
test_vmap_mlp_speed[True-True] 0.6991ms 0.5508ms 1.8154 KOps/s 1.7876 KOps/s $\color{#35bf28}+1.56\%$
test_vmap_mlp_speed[True-False] 0.7495ms 0.5803ms 1.7233 KOps/s 1.7977 KOps/s $\color{#d91a1a}-4.14\%$
test_vmap_mlp_speed[False-True] 0.6765ms 0.5055ms 1.9784 KOps/s 2.0447 KOps/s $\color{#d91a1a}-3.24\%$
test_vmap_mlp_speed[False-False] 0.6830ms 0.5052ms 1.9796 KOps/s 2.0432 KOps/s $\color{#d91a1a}-3.11\%$
test_vmap_mlp_speed_decorator[True-True] 1.3163ms 0.6148ms 1.6265 KOps/s 1.5220 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_vmap_mlp_speed_decorator[True-False] 0.7784ms 0.6127ms 1.6321 KOps/s 1.5646 KOps/s $\color{#35bf28}+4.31\%$
test_vmap_mlp_speed_decorator[False-True] 0.7519ms 0.5498ms 1.8189 KOps/s 1.7489 KOps/s $\color{#35bf28}+4.00\%$
test_vmap_mlp_speed_decorator[False-False] 0.7518ms 0.5543ms 1.8042 KOps/s 1.7625 KOps/s $\color{#35bf28}+2.36\%$
test_vmap_transformer_speed[True-True] 7.4944ms 7.1757ms 139.3586 Ops/s 136.8322 Ops/s $\color{#35bf28}+1.85\%$
test_vmap_transformer_speed[True-False] 7.3377ms 7.1712ms 139.4470 Ops/s 136.1675 Ops/s $\color{#35bf28}+2.41\%$
test_vmap_transformer_speed[False-True] 7.4365ms 7.0935ms 140.9733 Ops/s 137.3083 Ops/s $\color{#35bf28}+2.67\%$
test_vmap_transformer_speed[False-False] 7.2691ms 7.0958ms 140.9278 Ops/s 137.8661 Ops/s $\color{#35bf28}+2.22\%$
test_vmap_transformer_speed_decorator[True-True] 18.2442ms 17.4946ms 57.1605 Ops/s 56.1402 Ops/s $\color{#35bf28}+1.82\%$
test_vmap_transformer_speed_decorator[True-False] 17.5715ms 17.4526ms 57.2982 Ops/s 56.2050 Ops/s $\color{#35bf28}+1.95\%$
test_vmap_transformer_speed_decorator[False-True] 18.2248ms 17.3991ms 57.4741 Ops/s 56.0375 Ops/s $\color{#35bf28}+2.56\%$
test_vmap_transformer_speed_decorator[False-False] 18.2171ms 17.3519ms 57.6307 Ops/s 56.3376 Ops/s $\color{#35bf28}+2.30\%$
test_to_module_speed[True] 2.1228ms 1.5406ms 649.1095 Ops/s 639.6198 Ops/s $\color{#35bf28}+1.48\%$
test_to_module_speed[False] 1.6070ms 1.5097ms 662.3790 Ops/s 651.3577 Ops/s $\color{#35bf28}+1.69\%$
test_tc_init 49.8110μs 29.6128μs 33.7692 KOps/s 40.3028 KOps/s $\textbf{\color{#d91a1a}-16.21\%}$
test_tc_init_nested 0.2443ms 61.5877μs 16.2370 KOps/s 19.0397 KOps/s $\textbf{\color{#d91a1a}-14.72\%}$
test_tc_first_layer_tensor 0.7728μs 0.3603μs 2.7754 MOps/s 2.7919 MOps/s $\color{#d91a1a}-0.59\%$
test_tc_first_layer_nontensor 1.5139μs 0.3902μs 2.5628 MOps/s 2.5647 MOps/s $\color{#d91a1a}-0.08\%$
test_tc_second_layer_tensor 16.3210μs 1.0846μs 922.0095 KOps/s 1.0136 MOps/s $\textbf{\color{#d91a1a}-9.03\%}$
test_tc_second_layer_nontensor 2.0000μs 0.8037μs 1.2442 MOps/s 1.2222 MOps/s $\color{#35bf28}+1.81\%$
test_unbind 0.1052s 6.3788ms 156.7697 Ops/s 128.4100 Ops/s $\textbf{\color{#35bf28}+22.09\%}$
test_full_like 13.8634ms 13.2833ms 75.2824 Ops/s 76.1651 Ops/s $\color{#d91a1a}-1.16\%$
test_zeros_like 8.2209ms 7.7940ms 128.3037 Ops/s 127.2560 Ops/s $\color{#35bf28}+0.82\%$
test_ones_like 8.4964ms 7.7951ms 128.2861 Ops/s 127.1861 Ops/s $\color{#35bf28}+0.86\%$
test_clone 9.4998ms 9.2983ms 107.5471 Ops/s 107.7945 Ops/s $\color{#d91a1a}-0.23\%$
test_squeeze 59.4110μs 10.6966μs 93.4875 KOps/s 90.4613 KOps/s $\color{#35bf28}+3.35\%$
test_unsqueeze 93.9610μs 50.5184μs 19.7947 KOps/s 19.4987 KOps/s $\color{#35bf28}+1.52\%$
test_split 0.1999ms 94.5933μs 10.5716 KOps/s 10.3543 KOps/s $\color{#35bf28}+2.10\%$
test_permute 0.2242ms 0.1075ms 9.3029 KOps/s 9.0629 KOps/s $\color{#35bf28}+2.65\%$
test_stack 27.0651ms 26.8280ms 37.2745 Ops/s 37.1783 Ops/s $\color{#35bf28}+0.26\%$
test_cat 27.0262ms 26.7653ms 37.3618 Ops/s 37.2988 Ops/s $\color{#35bf28}+0.17\%$

@vmoens vmoens merged commit aa1fa9e into main Jun 19, 2024
36 of 38 checks passed
@vmoens vmoens deleted the fix-tdmodule-nontensor branch June 19, 2024 14:42
Successfully merging this pull request may close these issues.

[BUG] TensorDictModule with single NonTensorStack fails